Add signature verification to gossip#1937
Conversation
| /// The network | ||
| pub gossip: CrdsGossip, | ||
| /// set the keypair that will be used to sign crds values generated. It is unset only in tests. | ||
| keypair: Option<Arc<Keypair>>, |
There was a problem hiding this comment.
I don't see the need for the Option here. Instead of None, can you use Pubkey::default()?
There was a problem hiding this comment.
Yeah, Rob was saying the same. I think it makes a little more sense for clarity sake. I'll explain and maybe you can suggest a better solution to the overall problem.
Cluster Info uses the pubkey from the node_info which gets passed in during construction. In fullnodes (most common production usecase), the node_info's pubkey is the same as the keypair that gets passed in optionally(via the new constructor I added).
In nearly all other uses (spy nodes, tests etc) cluster info just carrying around dummy nodes who's pubkeys are no longer associated with any keypair that cluster info can use for signature verification. If I init cluster's keypair to default, it will always fail verification during gossip. The same failure occurs if keypair isn't even initialized since the messages won't be signed and their signatures will not verify. So given that the end result is the same, I chose the Option to make it a little clearer why verification might be failing. Having a valid keypair makes it seem like the node's pubkey is the same as the keypair's but that's not necessary and that's the actual problem here. Cluster info can be constructed with nodes and not have a valid keypair to sign messages with.
Optionally, I can ignore the node's pubkey and just use the keypair always but then that becomes a special case for gossip where gossip messages are always signed with a random keypair whose pubkey doesn't tie into any of the other info blobs (nodes) in the system.
There was a problem hiding this comment.
@sagar-solana have ClusterInfo::new(..) take a keypair and get rid of default?
There was a problem hiding this comment.
@aeyakovenko, yup I can do that but it doesn't solve the problem that most users of cluster info just give it a random node whose keypair is discarded. What keypair do I pass in those cases? In our primary usecase (fullnode's instance of cluster which gets passed to ncp) this isn't an issue but in a whole bunch of places we don't have a keypair to pass in.
Unless its totally fine to expect users of cluster_info to pass in random keypairs knowing that gossip will fail in that cluster_info since that's what will happen with how I have it right now if cluster_info is constructed without a keypair. In that case I'd be more than happy to get rid of default.
There was a problem hiding this comment.
Is it looking like a ton of work to refactor such that all tests are using legit keypairs?
There was a problem hiding this comment.
Pretty much. Most tests don't even care about having gossip run so it would be sort of unnecessary as I see it right now. For now I've done what you asked and removed the option but left in the other constructor. All Fullnodes should work correctly with gossip verification. If anything else tries to run gossip the peers will drop their messages (like spynodes shouldn't be gossiping anyway, only listening) so that actually works out well.
| pub fn is_valid_address(addr: &SocketAddr) -> bool { | ||
| (addr.port() != 0) && Self::is_valid_ip(addr.ip()) | ||
| } | ||
| pub fn get_sign_data(&self) -> Vec<u8> { |
There was a problem hiding this comment.
@sagar-solana can you break this out into a singe signed serialized structure? @garious i am not a huge fan of this pattern from Transaction, because of how error prone it is.
There was a problem hiding this comment.
Yes, Transaction should have a TransactionData struct that includes everything but its signatures.
There was a problem hiding this comment.
Alright I can. I'll end up sticking signature into the Protocol enums instead of into the individual crdsvalues. That way I can just use default serialization for the values. Sounds good? Or should I do what @garious suggested with a data inner struct?
There was a problem hiding this comment.
@sagar-solana the protocol enums wont work because when you receive a list of CrdsValues you need to verify the signature of each one. The protocol messages are the p2p messages, but the values themselves propagate p2N
There was a problem hiding this comment.
Oh that's right! Okay then I need to use the inner data structs to avoid manually serializing. Thanks for pointing that out. Got carried away trying to simplify...
| } | ||
|
|
||
| fn pubkey(&self) -> Pubkey; | ||
| fn get_sign_data(&self) -> Vec<u8>; |
There was a problem hiding this comment.
naming nit... getters don't have get_ on them in our conventions
There was a problem hiding this comment.
I didn't come up with that name. Txs already use it :P Suggestions? Just sign_data (could be confused with "please sign this data")?
There was a problem hiding this comment.
could be "signed_data()" ?
There was a problem hiding this comment.
But the is_signed state of this data isn't known. It's the data that gets signed and compared against for verification. It's not already signed. :(
There was a problem hiding this comment.
signable_data() to match the trait?
There was a problem hiding this comment.
yup, awesome. Thanks!
| use solana_sdk::pubkey::Pubkey; | ||
|
|
||
| ///The min size for bloom filters | ||
| pub const CRDS_GOSSIP_BLOOM_SIZE: usize = 1000; |
There was a problem hiding this comment.
is this in bits? does it make sense for it to be a multiple of 8?
There was a problem hiding this comment.
Not in bits. Num bits to use is figured out using this value (usually network_size is passed in). See https://hur.st/bloomfilter/.
- Assign default to keypair
- Needs tests
- Added a Signable trait that all signable values can implement. - CrdsValue is a Signable that contains Signables so it doesn't need to provide implementations for all the methods and can just override some defaults
- Spy nodes should be able to make pull requests
5f5e1f7 to
bab48f4
Compare
|
@aeyakovenko, if this looks ok to you, I'll merge. |
| /// The Pubkey of the intended node/destination for this message | ||
| pub destination: Pubkey, | ||
| /// The source Addr this message should have been received from | ||
| pub source: SocketAddr, |
There was a problem hiding this comment.
The source shouldn't be necessary, and might not be possible to enforce because of NATs. I would add a wallclock to the message so it expires if its replayed at a later time or by a random node.
There was a problem hiding this comment.
Oh I didn't think of that. The wall clock's a good idea though.
There was a problem hiding this comment.
@aeyakovenko, I picked 500ms as the timeout. Does that sound reasonable since prunes are p2p?
aeyakovenko
left a comment
There was a problem hiding this comment.
500ms might be tight, but we need to tests all the constants in the real world. Could you add a counter for expired prune messages? (A separate pr to track behavior is fine)
Bumps [typescript](https://github.com/Microsoft/TypeScript) from 4.3.3 to 4.3.4. - [Release notes](https://github.com/Microsoft/TypeScript/releases) - [Commits](microsoft/TypeScript@v4.3.3...v4.3.4) --- updated-dependencies: - dependency-name: typescript dependency-type: direct:development update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…port of solana-labs#1888) (solana-labs#1900) (solana-labs#1937) Revert "v2.0: Refactor and additional metrics for cost tracking (backport of solana-labs#1888) (solana-labs#1900)" This reverts commit 0aef62e.
Problem
Gossip lacks signature verification for messages, which means data can be spoofed.
Summary of Changes
Added Signatures to replicated gossip values and added verification of values are they get replicated across the network
Fixes #1855 and #1952